Knowledge Extraction on Multidimensional Concepts: Corpus Pattern Analysis (CPA) and Concordances
Abstract
The multidimensionality of concepts in multidisciplinary domains is a problem that terminographers have to deal with. We apply Corpus Pattern Analysis (CPA; Pustejovsky, Hanks, & Rumshisky, 2004) to extract conceptual dimensions according to context. The dynamic nature of these concepts is exemplified with the case study of SAND. Knowledge patterns (KPs), for their part, often convey different conceptual relations and are therefore polysemic structures. Developing pattern-based constraints can help to disambiguate them while avoiding conceptual noise, which would be a first step towards systematizing automatic knowledge extraction. Two KPs are analyzed in detail: rang* from, which conveys the conceptual relation is_a, and the polysemic KP formed by.
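As a rough illustration of what matching a KP with a pattern-based constraint could look like, the following minimal Python sketch retrieves concordance lines for rang* from and then filters them. It is not the authors' procedure: the sample sentences are invented, the function names concordance and passes_constraint are hypothetical, and the constraint used here (the right context must contain "to", as in "range from X to Y") is only an assumed example of such a constraint.

```python
import re

# The KP "rang* from": the wildcard covers rang, range, ranges, ranged, ranging.
KP_RANGE_FROM = re.compile(r"\brang\w*\s+from\b", re.IGNORECASE)


def concordance(sentence, pattern, width=40):
    """Return (left, match, right) keyword-in-context triples for each hit."""
    lines = []
    for m in pattern.finditer(sentence):
        left = sentence[max(0, m.start() - width):m.start()]
        right = sentence[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


def passes_constraint(right_context):
    """Assumed constraint: keep only hits completed by 'to' (range from X to Y)."""
    return re.search(r"\bto\b", right_context) is not None


# Invented sample sentences, for illustration only.
sentences = [
    "Sand grains range from very fine to very coarse.",
    "The herd ranged from the hills every spring.",
    "Deposits ranging from silt to gravel were sampled.",
]

# Collect every hit, then keep those that satisfy the assumed constraint.
all_hits = [hit for s in sentences for hit in concordance(s, KP_RANGE_FROM)]
kept = [hit for hit in all_hits if passes_constraint(hit[2])]

for left, kp, right in kept:
    print(f"{left:>40} [{kp}] {right}")
```

In this sketch the second sentence matches the surface pattern but is discarded by the constraint, which is the kind of conceptual noise a pattern-based constraint is meant to avoid.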
Similar resources
Mapping WordNet Concepts with CPA Ontology
This paper discusses the enrichment of WordNet data through merging of WordNet concepts and Corpus Pattern Analysis (CPA) semantic types. The 253 CPA semantic types are mapped to the respective WordNet concept. As a result of mapping, the hyponyms of a synset to which a CPA semantic type is mapped inherit not only the respective WordNet semantic primitive but also the CPA semantic type.
Tailored Feature Extraction for Lexical Disambiguation of English Verbs Based on Corpus Pattern Analysis
We give a report on a detailed study of automatic lexical disambiguation of 30 sample English verbs. We were drawing on a lexicon of English verb patterns based on the Corpus Pattern Analysis (CPA), which is a novel lexicographic method that seeks to cluster verb uses according to the morpho-syntactic, lexical and semantic/pragmatic similarity of their contexts rather than to associate them wit...
Analyzing the Sense Distribution of Concordances Obtained by Web as Corpus Approach
In corpus-based lexicography and natural language processing fields some authors have proposed using the Internet as a source of corpora for obtaining concordances of words. Most techniques implemented with this method are based on information retrieval-oriented web searchers. However, rankings of concordances obtained by these search engines are not built according to linguistic criteria but t...
Software and Data for Corpus Pattern Analysis
This report describes the tools and resources developed to support Corpus Pattern Analysis (CPA)—a corpus-based method for building patterns dictionaries. The tools are an annotation of concordance in Sketch Engine, a special CPA editor for editing Pattern Dictionary of English Verbs (PDEV), dedicated servlets based on the Dictionary Editing and Browsing platform and a public interface for brow...
SemEval-2015 Task 15: A CPA dictionary-entry-building task
This paper describes the first SemEval task to explore the use of Natural Language Processing systems for building dictionary entries, in the framework of Corpus Pattern Analysis. CPA is a corpus-driven technique which provides tools and resources to identify and represent unambiguously the main semantic patterns in which words are used. Task 15 draws on the Pattern Dictionary of English Verbs ...